feat: helix-org prototype with MCP, prompt-driven CLI, transports (webhook/email/github), and Role/Identity split by philwinder · Pull Request #2286 · helixml/helix

philwinder · 2026-04-25T13:21:56Z

Summary

Introduces helix-org, a standalone Go prototype for a hybrid human/AI organization system. This PR is a WIP/Draft collecting the core infrastructure, three transport implementations, MCP prompts (slash commands), and a set of runnable demos.

Core platform

Model Context Protocol (MCP) Integration: All mutations flow through MCP endpoints at /workers/{id}/mcp using Streamable HTTP transport. Tool visibility is grant-filtered per worker.
MCP Prompts (Slash Commands): Server-defined prompts registered in the MCP surface alongside tools. Each prompt has a name, title, description, arguments, and a render method that produces seed messages. Grant-gated (a prompt requires a tool to be visible). Auto-generated /help command that walks the registry at render time — new prompts automatically appear without manual updates. /role command drafts a new Role from a title hint, expands to full interview template, saves via create_role, then offers edits or chains to hire_worker.
Chat Typeahead: UI dropdown showing available slash commands on every keyup in the chat textarea. Server-side expansion in the chat bridge: SendHandler intercepts /name inputs, expands them from template before sending to claude. User sees original input in their bubble; claude gets the expanded text. Enables interactive discovery and reduces friction.
Enum Schema Hints: WorkerKind and TransportKind surface as enums in the JSON Schema that MCP clients see, enabling better autocomplete. Validation errors are self-documenting: unknown worker kind "foo" (valid: "human", "ai") so clients can self-correct.
Prompt-Driven CLI: New helix-org prompt subcommand spawns Claude Code with inline MCP configuration, enabling natural-language orchestration of the entire organization graph (Roles, Workers, Positions, Streams, Grants).
Role vs Worker Split: Separates the job (Role: owner-edited markdown, fanned out via update_role) from the person (Worker: per-hire identity, immutable). Allows live edits to job descriptions without touching identities.
Environment Provisioning & Push Dispatch: Each Worker gets an isolated environment directory. When events land on subscribed Streams, the system spawns a fresh Claude Code activation (one-shot) with that worker's MCP endpoint. Role and identity are stamped into the environment; the agent reads them and acts on the event trigger.
Canonical Message envelope: Every Event.Body is a domain.Message JSON (From / To / Subject / Body / ThreadID / InReplyTo / MessageID / Extra). The spawner renders every populated field into the activation prompt so Workers branch on transport-shaped metadata directly, without a separate read_events round-trip.
Simplified Grant Model: Grants are strictly (WorkerID, ToolName) pairs with no enforcement/scope logic. A grant is the permission; the agent is trusted to comply.
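
The self-documenting validation error described under Enum Schema Hints can be sketched as follows. This is a minimal illustration, not the PR's code: the `ParseWorkerKind` function and the `validWorkerKinds` slice are hypothetical names, and only the two kind values and the error text come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkerKind mirrors the enum named in the PR description.
type WorkerKind string

const (
	WorkerHuman WorkerKind = "human"
	WorkerAI    WorkerKind = "ai"
)

// validWorkerKinds is a hypothetical single source of truth that could feed
// both the validator and the JSON Schema enum hint.
var validWorkerKinds = []WorkerKind{WorkerHuman, WorkerAI}

// ParseWorkerKind (hypothetical name) returns the matching kind, or an error
// that lists every valid value so a client can self-correct.
func ParseWorkerKind(s string) (WorkerKind, error) {
	quoted := make([]string, 0, len(validWorkerKinds))
	for _, k := range validWorkerKinds {
		if string(k) == s {
			return k, nil
		}
		quoted = append(quoted, fmt.Sprintf("%q", string(k)))
	}
	return "", fmt.Errorf("unknown worker kind %q (valid: %s)", s, strings.Join(quoted, ", "))
}

func main() {
	_, err := ParseWorkerKind("foo")
	fmt.Println(err) // unknown worker kind "foo" (valid: "human", "ai")
}
```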

Transports

Streams own their I/O. Three transport kinds, each behind its own package:

Local (default): in-process pub/sub between Workers.
Webhook: bidirectional HTTP. Outbound POSTs to a configured URL on every published event; inbound deliveries are HMAC-verified and fanned out to subscribed Workers. Demo: secretary worker bridges an external webhook to internal channels.
Email (Postmark): outbound via Postmark API; inbound via Postmark's webhook with alias-based stream routing. Demo: two-worker email exchange (Sam <-> Lee).
GitHub (inbound only): single /github/webhook endpoint, HMAC-verified via X-Hub-Signature-256, fans out to every Stream whose repo + events whitelist matches. Acting on a repo (label, comment, review, open PR) is the Worker's job via gh in its Environment; publish on a github stream returns a loud error. Demos: doc-engineer reviews docs PRs and tags docs issues; github-engineer implements features on a GitHub Project v2 board.
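
The HMAC check on inbound GitHub deliveries can be sketched as below. GitHub sends `X-Hub-Signature-256` as `sha256=` followed by the hex HMAC-SHA256 of the raw body. The function names `verifySignature` and `sign` are illustrative assumptions, not the names used in transports/github.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySignature (hypothetical name) checks a GitHub-style
// X-Hub-Signature-256 header value against the raw request body.
func verifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time.
	return hmac.Equal(got, mac.Sum(nil))
}

// sign produces the header value a sender would attach; used here only to
// exercise verifySignature.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("webhook-secret")
	body := []byte(`{"action":"opened"}`)
	fmt.Println(verifySignature(secret, body, sign(secret, body)))
	fmt.Println(verifySignature(secret, []byte("tampered"), sign(secret, body)))
}
```

The signature must be computed over the exact bytes received, so a handler would read the body once and verify before decoding any JSON.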

Operational config

DB-stored, redacted-by-default: provider credentials live in transport.<kind> keys with explicit Secrets: []string declarations. helix-org config get redacts every declared secret; regression tests pin the spec for both transport.postmark and transport.github so a future refactor can't silently drop a redaction entry.
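
Redaction driven by declared secrets might look like the sketch below. It assumes each secret entry names a top-level key of the stored JSON object and that `token` is the declared secret for transport.postmark; the `Spec` and `Redact` shapes are hypothetical, not the PR's config package.

```go
package main

import "fmt"

// Spec is a hypothetical stand-in for a registered config spec. Secrets lists
// the keys whose values must never be printed by default.
type Spec struct {
	Key     string
	Secrets []string
}

// Redact returns a copy of value with every declared secret masked.
// The input map is left untouched.
func Redact(spec Spec, value map[string]any) map[string]any {
	out := make(map[string]any, len(value))
	for k, v := range value {
		out[k] = v
	}
	for _, s := range spec.Secrets {
		if _, ok := out[s]; ok {
			out[s] = "[redacted]"
		}
	}
	return out
}

func main() {
	spec := Spec{Key: "transport.postmark", Secrets: []string{"token"}}
	stored := map[string]any{"token": "pm-123", "from": "support@example.com"}
	fmt.Println(Redact(spec, stored))
}
```

A regression test in this style would assert on the spec's secret list itself, so that dropping an entry fails the build rather than silently exposing a credential.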

Design Philosophy

Data/text over code: Configuration lives in Role markdown and prompts, not Go logic.
Keep core generic: Tools define their own scope and schemas; new tools are addable without core changes.
No workflow in code: Orchestration logic lives in Role prompts, not implicit chains in the codebase.
Smallest thing that works: No speculative abstractions.

What's Inside

domain/: Core types (Role, Worker, Position, Stream, Grant, Event, Message, Transport) + enum validators
prompts/: Prompt interface, Registry, builtins (/help, /role)
store/sqlite/: GORM-driven SQLite with AutoMigrate (no raw SQL migrations)
tools/: 13 MCP tools + spawner + registry + JSON schema enum hints
server/: HTTP endpoints for reads + MCP mutation handler + jsonapi.org serialization + chat bridge with slash expansion
cmd/helix-org/: CLI with serve, bootstrap, chat, config subcommands
broadcast/ & dispatch/: Event bus for push-based worker activation
transports/postmark, transports/github: provider-specific I/O packages
demos/: getting-started, newsroom, webhook, email, github, github-engineer - runnable end-to-end
design/: design docs for the canonical envelope, the email transport, the github transport

Testing

All code is tested end-to-end:

Bootstrap -> role create -> worker hire -> event publish -> worker activation with MCP -> live-edit role -> behavior change
Prompt registry auto-generation (Help sees new prompts registered after it)
Chat slash expansion and typeahead filtering
Enum schema and validation error formatting
Transport unit tests for HMAC verification, payload mapping, redaction
make check passes: 0 lint issues, race detector clean

Next Steps (Post-WIP)

Add persistent authentication (currently all callers are treated as root owner)
Move provider credentials to per-Worker scope so different teams can use different GitHub identities / inboxes
Extend to support human operators at the REPL
Integrate with the broader Helix platform

WIP because: the core prototype is complete and tested, but we're still validating the design with the broader team before finalizing the API surface and documentation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Update — domain/runtime split + unified Helix session shape

Helix-specific Worker fields moved off domain.Worker to a sidecar WorkerRuntimeState keyed on (workerID, backend, key). Six methods dropped from the domain interface.
Runtime layer moved out of tools/: new agent/, agent/claude/, agent/helix/ packages plus helix/helixclient/. tools/ now holds only org-graph MCP tools.
SpecsPublisher -> agent.WorkspaceSync. Logical-name contract (role.md, identity.md); each backend translates to its own layout. Fixes the prior path mismatch where update_role wrote job/* but the activation mandate read .context/*.
agent.md moved from tools/templates/ to agent/policy.md and embedded as agent.Policy so both runtimes share one source.
Unified Helix session shape: helix.Runtime (zed_agent) and helix.AgentType (zed_external) are non-configurable constants used by every project apply and every /sessions/chat post. Drops chat.agent_type config key and the Runtime fields on the spawner/applier so the spawner and chat backend can no longer drift to claude_code.

Verified end-to-end against app.helix.ml (getting-started demo).

Demos

The PR now includes seven runnable end-to-end demos:

getting-started — bootstrap, hire echo worker, publish/read events, live edit role.
webhook — inbound/outbound webhook transport, secretary summarizes and forwards.
email — bidirectional Postmark, two-worker support escalation with threading.
newsroom — multi-worker publishing pipeline (editor, fact-checker, publisher).
github — GitHub webhook inbound, multiple workers acting on issues/PRs via gh CLI.
github-engineer — GitHub Project v2 board worker implementing features spec-style.
manufacturing — NCR triage with Helix backend + comms-demo mock-channels: operator raises NCR → agent fans out (Slack/SMS/Email) → supervisor approves → agent confirms. Shows the hold pattern and the agent/human split.

Notes for reviewers

Manufacturing demo is the newest and was verified end-to-end against app.helix.ml:

Uses Helix-backed spawner + chat (not local claude).
Three webhook streams (supervisor DM, customer SMS, supplier email).
Role file bakes reference data (SPC, maintenance log, related NCRs, affected orders) so no external systems needed.
Two agent activations: NCR raised → fan out; supervisor reply → confirm & conditional send.
~90 seconds on stage, pre-flight & setup ~5 minutes.
Demonstrates the core value: agent assembles evidence and drafts; humans make three decisions (not chase data across seven systems).

All demos pass make ci (formatting, lint, race tests).

…ity split

Adds a complete proto-implementation of helix-org as a standalone Go project with:

- **MCP Integration**: All mutations flow through Model Context Protocol at /workers/{id}/mcp using Streamable HTTP transport. Tool list is grant-filtered per worker.
- **Prompt-Driven CLI**: New `helix-org prompt` subcommand spawns Claude Code with inline MCP config, enabling natural-language orchestration of the entire org graph.
- **Role vs Worker Split**: Roles are job descriptions (owner-edited markdown, fanned out via update_role). Workers are people in positions (per-hire identities, immutable).
- **Environment Provisioning**: Each Worker gets an isolated environment directory with:
  - role.md (propagated via update_role)
  - identity.md (per-hire, immutable)
  - agent.md (fixed stub: "Read role.md and identity.md, act on trigger")
  - mcp.json (dynamically generated per activation)
- **Push-Dispatch Event Loop**: When events land on subscribed channels, the system spawns a fresh Claude Code instance (one-shot activation) with that worker's MCP endpoint.
- **channel_members Tool**: Read-only MCP tool that lists workers subscribed to a channel, enabling Workers to query org membership without side effects.
- **Simplified Grant Model**: Grants are now strictly (workerID, toolName) pairs. Removed enforcement/scope entirely: a grant IS the permission, and the agent is trusted to comply.
- **Humanized Demos**: Getting-started and newsroom demos now use prompt-based CLIs with natural-language orchestration instead of raw API calls.

Major components:

- domain/: Core types (Role, Worker, Position, Channel, Grant, Event)
- store/sqlite: GORM-driven SQLite storage with AutoMigrate
- tools/: 13 MCP tools (create_role, hire_worker, etc.) + spawner
- server/: HTTP endpoints + MCP handler + jsonapi.org serialization
- cmd/helix-org: CLI with serve, bootstrap, prompt subcommands
- broadcast/dispatch: Event bus for push-based activation
- demos/: Two runnable examples (getting-started, newsroom editorial team)

Design principles embedded:

- Prefer data/text over code (config in Role markdown, not Go)
- Keep core generic (tools define their own scope and schemas)
- No workflow in code (agents orchestrate via prompts, not implicit chains)
- Write smallest thing that works (no speculative abstractions)

All code tested end-to-end: bootstrap → role create → worker hire → event publish → worker activation with MCP → live-edit role → behaviour change on next activation.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

A minimal three-Worker demo that produces an opinionated MLOps newsletter with a fresh angle each issue. Shows the prompt-driven philosophy at its tightest:

- Only files on disk are 3 short role markdown files (~25 lines each)
- A single helix-org prompt call creates the roles, positions, channels, and hires the team
- Editor picks the angle, researcher hunts for matching news, journalist crafts the narrative
- Re-run with a different brief and the same team produces a completely different angle on the same broad subject

Tested end-to-end: two briefs produced two distinct angles ("platform team tax" vs "feature stores as MLOps' open secret graveyard") with named subjects (Stitch Fix, Chime, Modal Labs, Tecton), proving the angle truly varies per brief.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Adds a new `helix-org tail [glob...]` CLI plus the `GET /tail` endpoint it talks to. Lets the human watch the cascade of a running team in real time without curl + jq incantations.

- Defaults to '*' (all channels). Globs use Go's path.Match: 'c-*', 'c-news?', 'c-newsletter'. Multiple globs unioned.
- Long-polls (default 30s wait, configurable via --wait).
- Pretty output: HH:MM:SS channel source body, with subsequent body lines indented under the body column. ANSI colour when stdout is a TTY; --no-color to disable.
- New broadcast.Broadcaster.SubscribeAll for wildcard wakes, so channels created mid-tail (e.g. by an editor's hire trigger) also wake the tail loop.
- New store.Events.ListSince(channelIDs, since, limit) returning oldest-first events strictly newer than the named event.
- URL surface designed to extend: bare globs are channel IDs today; future namespace prefixes (channel:c-*, activation:w-*) can be added without breaking compatibility.

Tested: store + broadcaster unit tests, server endpoint test covering glob match, since cursor, and default match. Live-tested against the running mlops-newsletter demo (history backfill, live event arrival via long-poll, multi-glob union). Newsletter README updated to use `helix-org tail` instead of curl.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
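
The glob union described in this commit can be sketched with `path.Match` from the standard library. The helper name `matchAny` is an assumption for illustration; only the use of path.Match, the '*' default, and the union semantics come from the commit message.

```go
package main

import (
	"fmt"
	"path"
)

// matchAny (hypothetical name) reports whether id matches at least one glob.
// An empty glob list falls back to "*", which matches every channel ID that
// contains no '/'.
func matchAny(globs []string, id string) (bool, error) {
	if len(globs) == 0 {
		globs = []string{"*"}
	}
	for _, g := range globs {
		ok, err := path.Match(g, id)
		if err != nil {
			return false, err // malformed pattern
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for _, id := range []string{"c-bullpen", "c-news1", "c-newsletter", "x-other"} {
		ok, _ := matchAny([]string{"c-news?", "c-bullpen"}, id)
		fmt.Printf("%s matched=%v\n", id, ok)
	}
}
```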

Both demos previously asked the user to either tail per-Worker activation.log files or curl the channel events endpoint. Replace both with helix-org tail:

- newsroom: drop "tile seven terminals" instruction in favour of one tail window (default '*' = all channels). Recommend per-channel globs (tail c-bullpen, tail c-recruiting) for narrower focus. "What to point at during the demo" callouts now name the exact tail command to run.
- getting-started: replace tail -f activation.log + curl-and-jq round-trip check with helix-org tail. Keep activation.log as a parenthetical for debugging the worker's internal claude stream.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

…h Transport extensibility

## Abstraction Simplification

- **Channel → Stream**: Unified the Channel concept into Stream, removing redundant abstraction. Streams now hold the single named pub/sub channel.
- **Stream → Subscription**: Renamed the worker-channel edge from Stream to Subscription using a composite key (worker_id, stream_id). This eliminates synthetic stream IDs and clarifies the semantic: a subscription is a worker's interest in a stream, not the stream itself.
- **Transport Field**: Added optional Transport field to Stream to support future integrations (Slack, email, webhook, RSS, tick). Defaults to "local" (in-process pub/sub). Designed to be extensible without core changes.

## Architecture Changes

### Domain Layer (domain/)

- Added `transport.go`: Transport struct with Kind (enum) and optional Config (json.RawMessage)
- Added `subscription.go`: Subscription struct with WorkerID, StreamID, CreatedAt (composite key, no synthetic ID)
- Updated `stream.go`: Renamed from Channel; now holds ID, Name, Description, CreatedBy, CreatedAt, Transport
- Updated `event.go`: Changed ChannelID field to StreamID
- Updated `id.go`: Removed ChannelID type

### Store Layer (store/sqlite/)

- Added `subscription.go`: Subscriptions repository with Create, Delete, Find, ListForWorker, ListForStream
- Updated `stream.go`: Renamed from channel.go; added TransportKind and TransportConfig columns
- Updated `event.go`: Changed column references from channel_id to stream_id; JOINs on subscriptions instead of streams
- Updated `streams_and_events_test.go`: Renamed from feed_and_channels_test.go; comprehensive test coverage for new abstractions
- Updated `store.go`: Renamed Channels → Streams; replaced Streams → Subscriptions

### Broadcast & Dispatch (broadcast/, dispatch/)

- Renamed all channelID references to streamID throughout
- Updated method signatures to use StreamID instead of ChannelID

### Tools Layer (tools/)

- Added `create_stream.go`: New tool taking optional transport argument
- Added `read_events.go`: Replaces read_feed.go; queries subscriptions then long-polls streams
- Added `read_*.go` (streams, grants, positions, roles, workers): MCP tools replacing HTTP read endpoints
- Updated `subscribe.go`, `unsubscribe.go`, `publish.go`: Use streamId and Subscriptions API
- Renamed `channel_members.go` → `stream_members.go`; calls Subscriptions.ListForStream
- Updated `spawner.go`: Trigger struct uses StreamID; updated event notification text

### Server & HTTP (server/)

- Moved all read endpoints to MCP tools; `/workers/{id}/mcp` now handles mutations only
- Updated `tail.go`: Long-poll attributes renamed to streamID; calls store.Streams.List
- Simplified `server.go`: Only MCP mutation handler and tail endpoint remain
- Deleted: bootstrap.go, channels.go, environment.go, feed.go, grants.go, positions.go, roles.go, workers.go

### Bootstrap & CLI (bootstrap/, cmd/)

- Updated default tool grants to reference new tool names
- Updated vocabulary throughout: c- prefix → s- prefix for stream IDs

### Demos (demos/)

- Updated all demo READMEs and role definitions from channel to stream vocabulary
- Added `mlops-newsletter/hire.txt`: Example hire prompt

## Benefits

1. **Clearer semantics**: Stream is what it says (a named pub/sub channel), Subscription is the worker's interest in it
2. **Extensibility**: Transport field allows future integrations without core changes
3. **Reduced complexity**: No synthetic stream IDs, no redundant Feed/Channel/Stream layers
4. **MCP-first design**: All mutations now routed through MCP, read endpoints are MCP tools
5. **Smaller server surface**: HTTP endpoints only for authentication + tail streaming

## Testing

All 57 test cases pass with race detector enabled across all packages:

- domain: Subscription and Transport validation
- store/sqlite: Subscriptions repository operations, stream queries with JOINs
- broadcast: Pub/sub with streamID
- server: Tail long-poll with stream glob matching
- tools: All 13 MCP tools with varied schemas

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The `/tail` HTTP long-poll endpoint and `helix-org tail/prompt/client` CLI subcommands are now unnecessary: all human observation and orchestration flows through MCP via `claude` sessions directly.

**Removals:**

- Delete server/tail.go (HTTP long-poll handler)
- Delete server/jsonapi.go (only used by tail)
- Delete cmd/helix-org/tail.go (CLI client)
- Delete cmd/helix-org/prompt.go (spawner stub)
- Delete cmd/helix-org/client.go (envelope types)
- Remove mux route for GET /tail
- Remove Broadcaster.SubscribeAll/UnsubscribeAll (dead after tail removal)
- Simplify serve/bootstrap doc: "one HTTP endpoint: /workers/{id}/mcp"

**Updates:**

- demos/getting-started/README.md: replace helix-org tail with claude watcher prompt using subscribe + read_events(wait=60)
- demos/mlops-newsletter/README.md: same pattern
- demos/newsroom/README.md: same pattern, plus add recruiter role "On hire" trigger to handle stream race condition
- CLAUDE.md: clarify that human observation uses MCP (no /tail endpoint)
- tools/publish.go: comment fix

**Fixes:**

- cmd/helix-org/bootstrap.go: make installClaudeMCPEntry idempotent by removing stale entry before adding (re-running bootstrap between demo wipes no longer fails)
- demos/newsroom/roles/recruiter.md: add "On hire" subscribe + retry guidance matching researcher/journalist (Renée was getting hired before Maya's hire activation created s-recruiting)

All three demos tested end-to-end: bootstrap → scaffold → hire cascade → event publishing → role live-edit → behavior change confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add helix-org chat, an interactive claude session pointed at a Worker's MCP endpoint (default w-owner). Supports --new, --resume, --worker flags, and session persistence via claude's per-cwd store with --continue.

Update all three demos to show only the interactive chat flow:

- getting-started: condensed from two-terminal to one, removed --install-claude-mcp, Bootstrap → chat → type prompts as w-owner
- mlops-newsletter: removed separate watcher terminal, team setup and brief publishing now happen inline in chat
- newsroom: removed multi-terminal watcher, all interaction happens in the bootstrap + chat session

Demos now focus on the actual user experience (typing into a chat) which mirrors a real UI-based server. Removed background concepts, multi-terminal complexity, and one-shot (-p) mode from demos.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

helix-org chat unconditionally passed --continue, so the first run in a fresh directory exited with "No conversation found to continue" before the user could type anything. Probe ~/.claude/projects/<encoded-cwd>/ for any .jsonl session file and only pass --continue when one exists; otherwise let claude start fresh, which still seeds a session for the next run to resume. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace claude's --continue flag with --resume <sessionId>, looked up by reading the most-recently-modified .jsonl in the cwd's session store and parsing the sessionId from its first line. --continue rejects sessions whose log ended on certain non-user events (e.g. an agent-name marker from a prior interrupted exit), failing with "No conversation found to continue" even when the session is fine to resume by ID. This blocked re-entry into chat in the demo directories whenever a previous chat had exited mid-flight. If no prior session exists, claude is launched without a resume flag and starts fresh — matching the desired first-run behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds two new MCP tools for worker-to-worker communication:

- dm: High-level tool bundling create_stream + invite_workers + publish into a single call. Creates per-pair streams with deterministic naming (s-dm-<sortedIDs>) so conversations reuse the same stream regardless of direction. Complements lower-level streaming tools with a high-level, autonomously-discoverable entry point.
- invite_workers: Subscribes one or more workers to a stream in a single call. Idempotent: re-inviting already-subscribed workers is a no-op. Enables batch subscription workflows without manual loop.

Both tools are granted to the owner during bootstrap and tested end-to-end (dm stream reuse across directions, idempotency, self-DM rejection, unknown worker rejection). Updated demo: newsroom step 6 now uses dm instead of manual 4-step workflow, and updated comments in publish/subscribe to point to dm as the high-level entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
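
Direction-independent naming for DM streams can be sketched by sorting the two worker IDs before joining them. The `dmStreamID` name and the "-" separator between IDs are assumptions; the commit only specifies the `s-dm-<sortedIDs>` shape and the self-DM rejection.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// dmStreamID (hypothetical name) derives one stream ID per worker pair, so
// a->b and b->a land on the same stream. Joining with "-" is an assumption.
func dmStreamID(a, b string) (string, error) {
	if a == b {
		return "", errors.New("cannot DM yourself")
	}
	ids := []string{a, b}
	sort.Strings(ids)
	return "s-dm-" + strings.Join(ids, "-"), nil
}

func main() {
	ab, _ := dmStreamID("w-sam", "w-lee")
	ba, _ := dmStreamID("w-lee", "w-sam")
	fmt.Println(ab, ab == ba)
}
```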

Replaces on-disk activation.log/jsonl files with a per-Worker activation Stream. Assistant text, tool calls, tool results, and lifecycle markers are now Events on s-activations-<workerID>, the same primitive as every other read in the system.

- hire_worker creates the activation Stream at hire time and subscribes the hiring Worker. The new Worker themselves is intentionally NOT subscribed (would loop the dispatcher otherwise).
- Spawner publishes one Event per atomic message segment (assistant text, tool_use, tool_result, system init, run result), bracketed by synthetic '=== activation: <trigger> ===' and '=== exit: <err> ===' markers. Append + Notify only: the dispatcher is skipped so per-message events can't re-trigger subscribed AI Workers.
- worker_log tool bundles subscribe + read_events scoped to one Worker's activation Stream. Mirrors the dm pattern: a friendly shortcut the agent can reach for from a 'show me what w-X is doing' instruction without knowing the stream-naming convention.

Persistence between activation runs is left to the Role: if a Worker needs cross-run memory, the Role tells it to write to history.md and read it back on the next activation. No system feature added.

Demos updated to showcase the new affordances:

- getting-started: step 3 uses worker_log to confirm hire activation finished, eliminating the cross-terminal log-watching requirement.
- mlops-newsletter: step 4 adds a peek-inside tip using worker_log.
- newsroom: adds a 'Watch a Worker work' step parallel to the dm step, plus a 'What to point at' bullet for fact-checker blocks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds inbound webhook support to helix-org Streams. Each Stream can declare transport.kind="webhook"; POST requests to /webhooks/<streamID> append the request body as an Event, trigger the dispatcher to wake subscribed Workers, and notify long-poll observers.

Key changes:

- domain/transport.go: add TransportWebhook kind with docstring
- server/server.go: add Dispatcher interface, update New() signature
- server/webhook.go: HTTP POST handler for /webhooks/{streamID}
- server/webhook_test.go: 9 test functions covering edge cases and concurrency
  * happy path, missing stream, wrong transport, empty body
  * size limits, nil broadcaster/dispatcher, UTF-8 handling
  * 25 concurrent POSTs, stream isolation
  * race-detector clean with -count=20

Also fixes critical :memory: SQLite concurrency bug:

- store/sqlite/sqlite.go: pin MaxOpenConns(1) for in-memory databases
- Root cause: each connection gets its own private :memory: DB
- Impact: concurrent HTTP tests now see consistent state

New demo:

- demos/webhook/README.md: 5-step specification (hire secretary, POST payload, read back)
- demos/webhook/roles/secretary.md: secretary subscribes to s-inbox, summarizes incoming payloads, DMs summaries to owner

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends the webhook transport so a Stream can be configured to POST each appended Event to an external URL. A Stream can now be inbound-only (current behaviour, no config), outbound-only (config sets outbound_url), or both at once: the dispatcher fires emit on every append regardless of origin (webhook handler, publish tool, dm tool).

Key changes:

- domain/transport.go: WebhookConfig type with OutboundURL field; Validate now parses webhook config and rejects non-http(s) URLs, relative URLs, and empty hosts before stream creation
- dispatch/dispatcher.go: emitOutbound runs on every Dispatch, looks up the Stream's transport, and if outbound_url is set fires an async POST with X-Helix-Stream and X-Helix-Event headers; bounded by 5s timeout so slow targets don't stall publishes
- domain/transport_test.go: 14 cases covering Validate happy paths and rejection paths, plus WebhookConfig parse round-trip
- dispatch/dispatcher_test.go: 12 tests covering emit happy path, inbound-only no-emit, local-no-emit, missing stream, 4xx/5xx tolerance, unreachable host, slow target timeout, 25 concurrent emits, binary payload round-trip, malformed stored config, store lookup errors, and content-type/path preservation
- server/webhook_test.go: TestWebhookBridgesInboundToOutbound wires the real dispatcher end-to-end and proves an external POST to /webhooks/<streamID> bridges to an outbound POST when the same stream has both directions configured

Demo narrative updated: secretary now subscribes to s-inbox, DMs the owner with the summary, and publishes the summary to s-outbox which is configured with outbound_url. A 4-terminal flow with a local nc catcher shows the full inbound -> summarise -> outbound bridge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
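
The validation rules listed for WebhookConfig (reject non-http(s) schemes, relative URLs, and empty hosts) can be sketched with `net/url`. The `outbound_url` JSON key comes from the commit message; the `parseWebhookConfig` helper and the error wording are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// WebhookConfig follows the shape named in the commit. An empty OutboundURL
// means the stream is inbound-only.
type WebhookConfig struct {
	OutboundURL string `json:"outbound_url,omitempty"`
}

// Validate rejects anything that is not an absolute http(s) URL with a host.
func (c WebhookConfig) Validate() error {
	if c.OutboundURL == "" {
		return nil
	}
	u, err := url.Parse(c.OutboundURL)
	if err != nil {
		return fmt.Errorf("outbound_url: %w", err)
	}
	// Relative URLs have an empty scheme, so this check covers them too.
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("outbound_url: scheme %q is not http or https", u.Scheme)
	}
	if u.Host == "" {
		return fmt.Errorf("outbound_url: missing host")
	}
	return nil
}

// parseWebhookConfig (hypothetical helper) decodes stored config and
// validates it. Empty input yields an inbound-only config.
func parseWebhookConfig(raw []byte) (WebhookConfig, error) {
	var c WebhookConfig
	if len(raw) == 0 {
		return c, nil
	}
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, err
	}
	return c, c.Validate()
}

func main() {
	for _, raw := range []string{
		`{"outbound_url":"https://example.com/hook"}`,
		`{"outbound_url":"ftp://example.com/hook"}`,
		`{"outbound_url":"/relative/path"}`,
		`{"outbound_url":"http:///no-host"}`,
	} {
		_, err := parseWebhookConfig([]byte(raw))
		fmt.Printf("%s -> %v\n", raw, err)
	}
}
```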

Adds domain.Message, a transport-agnostic envelope (From, To, Subject, Body, ThreadID, InReplyTo, MessageID, Attachments, Extra), and migrates every event-producing path to encode it as JSON in Event.Body. There is one storage shape going forward; future transports (email, Slack, queues, feeds) translate at their boundary, Workers see the same structure regardless of source.

Identity convention: From/To carry transport-native identifiers verbatim (WorkerIDs when known, alice@x.com / U0123 / +15551234 / etc. otherwise; no prefixes). Empty From means "no human originator" for data feeds and triggers.

Code changes:

- domain/message.go: Message + Attachment types, Encode/Decode helpers, Event.Message() parser, NewMessageEvent constructor
- tools/dm.go: produces Message{From: caller, To: [recipient], Body}
- tools/publish.go: accepts optional to/subject/threadId/inReplyTo/messageId/bodyContentType/attachments args; defaults From=caller
- server/webhook.go: wraps inbound POST bodies into Message{Body: raw}
- tools/spawner.go: activation log entries wrapped as Message{From: workerID, Body: line}; Trigger gains a Message field
- dispatch/dispatcher.go: parses Event.Body once, passes parsed Message and visible Body text to the spawner
- tools/read_events.go: surfaces Message.Body as `body` (visible text) and the full envelope as `message`. Roles needing structure read the latter; existing role prompts that read `.body` continue to work

Tests updated to use Event.Message() instead of comparing raw Body strings; full make check passes (lint clean, race detector clean).

Demos verified end-to-end after the refactor:

- getting-started: hire echo worker, publish "hello", echo replies, live-edit role, "loud: HELLO"; all four steps green
- webhook: secretary summarises inbound POST, DMs owner, publishes to s-outbox, outbound emitter POSTs Message JSON to nc:9000 catcher (catcher now sees structured envelope, not raw text; README updated to describe this)
- mlops-newsletter: full editor → researcher → journalist → editor cascade produces a complete newsletter on s-newsletter
- newsroom: 7 roles, 2 positions, 2 hires (Maya + Renée), all activations clean; message machinery validated without running the real-PR cascade

Design doc at design/messages.md captures the convention, the per-transport mapping table for future transports, and open questions to resolve as new transports ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
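
A minimal sketch of the envelope and its Encode/Decode helpers follows. The field names come from the commit message; the JSON tag spellings, the `map[string]string` type for Extra, and the omission of Attachments are assumptions made to keep the example short.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message sketches the transport-agnostic envelope stored in Event.Body.
// JSON tag names are assumed; Attachments is left out for brevity.
type Message struct {
	From      string            `json:"from,omitempty"`
	To        []string          `json:"to,omitempty"`
	Subject   string            `json:"subject,omitempty"`
	Body      string            `json:"body"`
	ThreadID  string            `json:"threadId,omitempty"`
	InReplyTo string            `json:"inReplyTo,omitempty"`
	MessageID string            `json:"messageId,omitempty"`
	Extra     map[string]string `json:"extra,omitempty"`
}

// Encode renders the envelope as the JSON string kept in Event.Body.
func Encode(m Message) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

// Decode parses an Event.Body back into the envelope.
func Decode(body string) (Message, error) {
	var m Message
	err := json.Unmarshal([]byte(body), &m)
	return m, err
}

func main() {
	// An inbound webhook wraps the raw POST body with no From, meaning
	// "no human originator".
	raw, _ := Encode(Message{Body: "deploy finished"})
	fmt.Println(raw)

	// A DM carries transport-native identifiers verbatim.
	dm, _ := Encode(Message{From: "w-owner", To: []string{"w-sam"}, Body: "hello"})
	back, _ := Decode(dm)
	fmt.Println(back.From, back.To, back.Body)
}
```

Because every producer writes the same shape, a reader such as read_events only needs one decode path regardless of which transport the event arrived on.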


Implements the email transport, the operational-config infrastructure it sits on, and a runnable customer-service demo (Sam) that emails land at and reply through.

Verified end-to-end: simulated inbound POST → +sam alias routed → Sam's claude activation → reply published to s-support → outbound emit POSTed to Postmark's /email API → real email delivered to phil@winder.ai. ~22s wall-clock end-to-end on a cold activation.

Operational config (design/config.md):
- New configs table (key/value/audit), store.Configs interface, sqlite impl. Auto-migrated alongside the rest.
- config.Registry: subsystems Register a Spec (type, default, required, secret paths, description). Reads/writes go through it so the CLI's view matches what consumers actually consume.
- helix-org config CLI: set/get/list/delete. Opens the SQLite file directly (same path as bootstrap), so config writes commit and the running server picks them up on its next read: live updates without restart, and without an LLM ever touching the values. Secrets redacted by default; --reveal-secrets opts in.
- Strict separation: org-graph mutations stay on MCP; operational config (transport creds, future model selection, etc.) is CLI-only. Same SQLite file, two access paths, two threat models.

Email transport (transports/postmark):
- domain.TransportEmail kind + EmailConfig{Alias} stream config. Validate enforces lowercase alphanumeric/dash/underscore aliases so they compose safely into <hash>+<alias>@... or <alias>@Domain.
- Inbound HTTP handler at /email/postmark: parses Postmark's JSON, extracts the +alias suffix from OriginalRecipient, finds the matching Stream by alias, builds a domain.Message envelope (From, To, Subject, Body, MessageID, InReplyTo, ThreadID from headers, Attachment metadata), appends the event, fires the dispatcher.
- Outbound emitter: when a Worker publishes to an email Stream, the dispatcher invokes the transport's Emit, which composes a Postmark /email POST (From=server-config, To from Message.To, optional Reply-To at <hash>+<alias>@... for threading, In-Reply-To/References headers when set).
- Server-level config (token, inbound, from, optional disable_reply_to) lives in transport.postmark; per-stream config is just {"alias":"sam"}. The transport joins the two at runtime, so rotating creds is one CLI call with no restart.
- disable_reply_to flag: workaround for Postmark's pending-approval same-domain restriction (Reply-To at inbound.postmarkapp.com is treated as a cross-domain recipient and blocks the send). With it on, outbound works but customer replies won't loop back into helix until the account is approved; this is documented in the demo README as the path to closing the loop.

Dispatcher loop guard:
- Skip outbound emit when event.Source == "" (system-emitted, i.e. inbound from this transport's own webhook). Without this, a bidirectional Stream (one alias, both inbound and outbound) would echo every inbound message straight back out to itself. Worker-published events (Source != "") still emit normally.
- Replaced TestWebhookBridgesInboundToOutbound with TestWebhookInboundDoesNotEcho to lock the new behaviour in.

Server:
- Server.Handler now takes optional Routes so transports can mount their own inbound endpoints without server.go importing them. The email transport's /email/postmark gets mounted from cmd/helix-org/serve.go.

Demo (demos/email):
- README.md walks through the whole flow: signup → server token → Sender Signature → inbound hash → cloudflared/ngrok tunnel → Postmark InboundHookUrl → helix-org config set transport.postmark → bootstrap → hire Sam → send a real email. Includes the pending-approval workaround and the path to closing the customer-reply loop once approved.
- roles/customer-service.md: Sam reads inbound, drafts a 2–4 sentence reply, escalates rather than fabricates, signs off '— Sam' on its own line.
- workers/sam.md: identity stub (real first name, no brand voice, knows when he doesn't know).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
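
A minimal sketch of the alias handling described in the email transport notes above: validating a stream alias and pulling the +alias suffix out of a recipient address. The function names `ValidateAlias` and `ExtractAlias`, the regular expression, and the error wording are illustrative assumptions, not the code in transports/postmark.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// aliasPattern approximates the rule in the commit text: lowercase
// alphanumerics, dash and underscore. The exact pattern is an assumption.
var aliasPattern = regexp.MustCompile(`^[a-z0-9_-]+$`)

// ValidateAlias rejects aliases that would not compose safely into an
// address of the form <hash>+<alias>@... (hypothetical helper).
func ValidateAlias(alias string) error {
	if !aliasPattern.MatchString(alias) {
		return fmt.Errorf("invalid email alias %q (want lowercase letters, digits, '-' or '_')", alias)
	}
	return nil
}

// ExtractAlias returns the +alias suffix of the local part of an address,
// or false when the address carries no alias (hypothetical helper).
func ExtractAlias(recipient string) (string, bool) {
	local, _, ok := strings.Cut(recipient, "@")
	if !ok {
		return "", false
	}
	_, alias, ok := strings.Cut(local, "+")
	if !ok || alias == "" {
		return "", false
	}
	return strings.ToLower(alias), true
}

func main() {
	alias, ok := ExtractAlias("abc123+sam@inbound.postmarkapp.com")
	fmt.Println(alias, ok, ValidateAlias(alias))
}
```

An inbound handler could then look up the Stream whose configured alias equals the extracted value and drop the delivery when no alias is present.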

Updates the email demo to show two workers, customer service (Sam, alias=sam) and engineering (Lee, alias=engineer), handling a customer query that requires escalation. Every leg of the four-hop cascade goes through Postmark; both Streams are bidirectional; threading via Message-Id stitches the whole thing into one logical conversation.

Verified e2e in ~2:15 wall-clock:
- customer → Sam (Postmark inbound → s-support)
- Sam → Lee (Postmark send + inbound → s-engineer)
- Lee → Sam (Postmark send + inbound → s-support, [eng] prefix)
- Sam → customer (Postmark send → real inbox)

Three Postmark sends, all returned status=200; same ThreadID flowed through every event.

Changes:
- demos/email/roles/customer-service.md: Sam now branches on Subject. `[eng]` prefix means Lee replied → walk s-support history by ThreadID to find the customer's original query, then reply to that customer with a paraphrased version of Lee's answer. Otherwise it's a customer query → answer directly when simple, forward to <hash>+engineer@inbound.postmarkapp.com when technical. ThreadID preservation is critical for the lookup.
- demos/email/roles/engineer.md (new): Lee subscribes to s-engineer, drafts 3-6 sentence technical answers, replies to Sam at the +sam alias with `[eng] Re:` subject prefix and preserved ThreadID.
- demos/email/workers/lee.md (new): identity stub.
- demos/email/README.md: rewritten "Run the demo" section for the two-worker flow. Adds an explicit `<INBOUND_HASH>` sed substitution step (workers know each other's addresses via role text). Drops the disable_reply_to workaround now that the Postmark account is approved. New "What this shows" bullets call out workers-as-email-participants and ThreadID-as-spine.
- demos/email/demo.cast: re-recorded asciicast of the four-hop cascade. The mp4 (demos/email/demo.mp4) is regenerated locally but stays gitignored, same convention as demos/getting-started/demo.mp4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously the activation prompt only carried Body. The Worker had to call read_events to learn Subject, From, ThreadID, Extra — exactly the round-trip that caused the docs-engineer to misroute issue #3 to PR #2 during the github demo's E2E run. renderTrigger now formats every populated envelope field into the prompt, omitting empties for cleanliness. The Trigger.Body field is dropped; callers pass the full Message instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
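
To illustrate the idea of rendering every populated envelope field while omitting empties, here is a small sketch. The `Message` struct is a trimmed stand-in (the real envelope also carries Extra and other fields), and the "Label: value" layout is an assumption; only the name `renderTrigger` comes from the commit text.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a reduced, hypothetical version of the envelope.
type Message struct {
	From, To, Subject, ThreadID, MessageID, Body string
}

// renderTrigger writes one "Label: value" line per populated header field,
// skips empty ones, and appends the body after a blank line.
func renderTrigger(m Message) string {
	fields := []struct{ label, value string }{
		{"From", m.From},
		{"To", m.To},
		{"Subject", m.Subject},
		{"ThreadID", m.ThreadID},
		{"MessageID", m.MessageID},
	}
	var b strings.Builder
	for _, f := range fields {
		if f.value == "" {
			continue // omit empties for cleanliness
		}
		fmt.Fprintf(&b, "%s: %s\n", f.label, f.value)
	}
	if m.Body != "" {
		b.WriteString("\n" + m.Body + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(renderTrigger(Message{Subject: "Docs typo", ThreadID: "#3", Body: "Fix it"}))
}
```

With the thread identifier in the prompt itself, a Worker no longer needs a read_events round-trip to tell issue #3 from PR #2.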

GitHub POSTs to a single /github/webhook endpoint; the transport HMAC-verifies via X-Hub-Signature-256 against the installation's webhook_secret, then fans the delivery out to every Stream whose Config.Repo matches repository.full_name and whose Config.Events whitelist contains the X-GitHub-Event header value. Inbound only — acting on a repo (label, comment, review, open PR) is the Worker's job via gh in its Environment. publish on a github stream returns a loud error rather than silently no-op'ing. The Message envelope is mapped from the upstream payload verbatim: Subject = issue/PR title, Body = body, ThreadID = "#<number>", MessageID = X-GitHub-Delivery, From = sender.login, Extra = the full payload with one synthetic top-level "event" key injected from the X-GitHub-Event header so Workers can branch on event type from Extra alone. Per-stream config is just routing identity (repo, events). Provider credentials (token, webhook_secret) live in server-level config under transport.github with both fields registered as Secrets so config get redacts them. Regression tests pin both names against silent leaks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
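
The signature check mentioned above can be sketched as follows. GitHub sends `X-Hub-Signature-256` as `sha256=` followed by the hex HMAC-SHA256 of the raw request body keyed with the webhook secret; the function names `VerifySignature` and `sign` are illustrative, not the transport's actual identifiers.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// VerifySignature reports whether header (the X-Hub-Signature-256 value)
// is a valid HMAC-SHA256 of body under secret.
func VerifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(header[len(prefix):])
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time.
	return hmac.Equal(got, mac.Sum(nil))
}

// sign produces a header value the way a sender would; used here only to
// exercise VerifySignature.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret, body := []byte("s3cret"), []byte(`{"action":"opened"}`)
	fmt.Println(VerifySignature(secret, body, sign(secret, body)))
}
```

The check has to run against the raw bytes of the request body, before any JSON decoding, because re-serialising the payload would change the digest.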

Walkthrough demo of the doc-engineer role: spin up a real cloudflared tunnel, register the webhook, hire the Worker, then exercise the issues + pull_request + pull_request_review + issue_comment paths against a live GitHub repo. README narrates each step; demo.cast is the asciinema recording.

Design doc covers the identity model (no machine user; gh auth token gives the engineer the operator's own identity for now), the inbound-only decision, the message envelope mapping, and the operational config / setup-via-chat flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move Role.Content and Worker.IdentityContent from disk-based markdown files (role.md, identity.md) into the SQLite domain, enabling future evolution to remote workspaces and eliminating hardcoded filename coupling.

## Key changes

- Domain: Worker interface now exposes IdentityContent() string method; both HumanWorker and AIWorker carry immutable identity field. Constructor signatures updated to accept identity content at hire time.
- Store: Added Update(ctx, worker) method to Workers interface, implemented via GORM with identity_content column in worker table.
- Tools:
  - update_role: Simplified to single DB write (removed 50-line fanOut loop).
  - update_identity: New tool, mirrors update_role's shape.
  - hire_worker: Creates DB records only; no env files at hire time.
  - spawner: Added projectEnv() function that lazily writes role.md, identity.md, agent.md to env at activation time, reading from DB.
- Bootstrap: Seed owner Worker with starter identity text; grant UpdateIdentityName.
- UI: Added /ui/org org-chart master-detail view. handleOrgIdentitySet() now calls Workers.Update() instead of WriteFile(). Removed disk path tracking.
- Tests: Updated 12+ call sites with identity parameter; rewrote TestUpdateRoleFanOut as TestProjectEnvWritesCanonicalState to verify lazy-projection contract.

## Why

Hardcoded filenames across hire_worker, tools, spawner, and UI meant the system could not evolve to support remote workspaces or other workspace configurations. Making the DB the source of truth and performing projection at activation time (not at hire time) lets future work extend to remote/ephemeral environments without changing tool or bootstrap logic.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

…ahead

Add MCP prompts, server-defined slash commands gated by tool grants:
- New prompts package: Prompt interface, Registry (mirrors tools.Registry), and builtins (Role and Help).
- /help: Self-introspecting command that walks the registry at render time and produces a markdown list of every other prompt. Adding a new prompt automatically lights it up in /help without touching this file.
- /role: Drafts a new Role from a title hint, expands to full interview template, saves via create_role, then offers edits or chains to hire_worker.
- Server-side expansion in chat bridge: SendHandler intercepts inputs starting with /, expands them from template before sending to claude. User sees original input in their bubble.
- Chat typeahead: CommandsHandler (POST /ui/chat/commands) renders matching prompts as HTML buttons on every keyup. Clicking fills the textarea and focuses it.
- Enum schema constraints: WorkerKind and TransportKind now surface as enums in JSON Schema so MCP clients see valid values in tool input autocomplete.
- Self-documenting validation: WorkerKind.Validate() formats errors as 'unknown worker kind "foo" (valid: "human", "ai")' so clients can self-correct without reading source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
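
The self-documenting validation error can be reproduced with a few lines of Go. This is a sketch under assumptions: the constant names and the `validWorkerKinds` slice are hypothetical; only the `WorkerKind` type, the `Validate` method name, and the error text come from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// WorkerKind is the enum-like string type named in the commit text.
type WorkerKind string

// Constant names below are illustrative.
const (
	WorkerKindHuman WorkerKind = "human"
	WorkerKindAI    WorkerKind = "ai"
)

var validWorkerKinds = []WorkerKind{WorkerKindHuman, WorkerKindAI}

// Validate returns nil for a known kind; otherwise the error lists every
// valid value so a client can correct itself without reading source.
func (k WorkerKind) Validate() error {
	quoted := make([]string, 0, len(validWorkerKinds))
	for _, v := range validWorkerKinds {
		if k == v {
			return nil
		}
		quoted = append(quoted, strconv.Quote(string(v)))
	}
	return fmt.Errorf("unknown worker kind %q (valid: %s)", string(k), strings.Join(quoted, ", "))
}

func main() {
	fmt.Println(WorkerKind("foo").Validate())
}
```

Deriving the message from the same slice that drives validation keeps the error text from drifting when a kind is added.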

… polling, and tool visibility

Major changes:
- **Prevent cascading AI-worker activations**: Added SourceKind classifier (human/ai) to Trigger; workers now deprioritize or skip AI-origin events per agent.md discipline rules. Dispatcher skips self-reactivation on publish. Tests pin self-skip and source_kind behavior.
- **Fix SSE newline rendering**: Split markdown fragments across multiple `data:` lines (SSE spec compliant) instead of collapsing newlines. Browser's EventSource rejoins with \n, preserving fenced code blocks and list formatting.
- **Add markdown rendering**: Integrated goldmark for safe HTML rendering of Role/Activity text. Added .md CSS class for styling (lists, code, links, headers, blockquotes). Goldmark runs in safe mode; raw HTML is omitted (not escaped). Tests verify bold/lists/code/headings render and <script> tags are dropped.
- **Real-time polling UI**: Added htmx polling (every 5s) to org chart, streams list, and events feed. Fixed htmx attribute inheritance breaking child click handlers by adding hx-disinherit="*" on poll parents. Implemented unified all-streams firehose when no stream selected.
- **Tool grant visibility**: Org detail now shows each Worker's granted tools as alphabetically-sorted chip badges. Schema exposes MCP tool names; UI surfaces them without requiring a separate tools query.
- **System prompt templates**: Moved agent.md and owner_role.md to embedded templates so content can be edited via /ui/org and doesn't require code changes. Agent.md teaches AI workers that human constraints don't apply and defaults to action. Owner role teaches delegation, polling pattern, and stream subscription during hiring.
- **Hiring playbook refinement**: Updated role template to instruct on stream provisioning: list_streams → create if missing → subscribe. Emphasized "Worker without streams is half-hired."
- **Title selection priority**: Sessions now track separate ai-title events and prefer them over user input for recents display (custom > ai-generated > fallback).
- **Model/effort defaults**: Changed claude.model default to "sonnet" for cost predictability; added claude.effort default "low" to minimize extended-thinking budget. Both configurable via registry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
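
The SSE newline fix can be sketched like this: each line of a multi-line payload is emitted as its own `data:` field, and the blank line terminates the event, so the browser's EventSource rejoins the fields with `\n`. The function name `formatSSE` is hypothetical, and the sketch assumes `\n` line endings in the payload.

```go
package main

import (
	"fmt"
	"strings"
)

// formatSSE frames payload as a single Server-Sent Events message with one
// "data:" field per payload line.
func formatSSE(payload string) string {
	var b strings.Builder
	for _, line := range strings.Split(payload, "\n") {
		b.WriteString("data: ")
		b.WriteString(line)
		b.WriteString("\n")
	}
	b.WriteString("\n") // blank line ends the event
	return b.String()
}

func main() {
	fmt.Print(formatSSE("- one\n- two"))
}
```

Writing the payload after a single `data:` prefix would instead let the embedded newline end the field early, which is how list and fenced-code formatting gets lost.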

… docs

- Update 'make run' to automatically invoke 'helix-org serve' with sensible defaults (./envs, ./helix-org.db, :8080) rather than bare 'go run'
- Enhance 'make clean' to kill running servers and remove local state (DB, envs) in one command
- Improve CLAUDE.md to document these defaults and explain when/why to use each target
- Clarify that ad-hoc 'go' commands should be avoided in favor of make targets to ensure consistent build/test environment

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

The dispatcher now coalesces events that arrive while an activation is running, passing them to the Spawner as a single batched []Trigger instead of spawning N separate claude processes. This collapses webhook cascades (e.g. five GitHub events from a worker's own action against a shared auth token) into one follow-up activation.

Implementation:
- Spawner signature: trigger -> []Trigger
- Dispatcher: per-worker queue (pending slice + running flag) replaces per-worker mutex. enqueue() appends and starts runner if needed; run() drains queue in a loop until empty, calling spawner once per drain with the accumulated batch.
- buildPrompt() renders multiple triggers as [1/N], [2/N], etc. when there's more than one, so agents see them as a numbered list.
- New test proves coalescing: block first activation, publish 3 more events, release -> expect [e-1] then [e-2, e-3, e-4], not 5 separate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
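
A minimal sketch of the pending-slice-plus-running-flag queue described above. The type name `workerQueue`, the `Trigger` fields, and the choice to have `enqueue` return a "start the runner" flag (instead of launching a goroutine itself) are assumptions made to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sync"
)

// Trigger is a reduced stand-in for the dispatcher's trigger type.
type Trigger struct{ EventID string }

// workerQueue holds the events waiting for one worker.
type workerQueue struct {
	mu      sync.Mutex
	pending []Trigger
	running bool
}

// enqueue appends t and reports whether the caller should start the runner
// (true only when no runner is currently active).
func (q *workerQueue) enqueue(t Trigger) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending = append(q.pending, t)
	if q.running {
		return false
	}
	q.running = true
	return true
}

// run drains the queue until it is empty, calling spawn once per drain with
// everything that accumulated while the previous activation was running.
func (q *workerQueue) run(spawn func([]Trigger)) {
	for {
		q.mu.Lock()
		if len(q.pending) == 0 {
			q.running = false
			q.mu.Unlock()
			return
		}
		batch := q.pending
		q.pending = nil
		q.mu.Unlock()
		spawn(batch) // lock is not held during the activation
	}
}

func main() {
	q := &workerQueue{}
	q.enqueue(Trigger{EventID: "e-1"})
	first := true
	q.run(func(batch []Trigger) {
		fmt.Println(len(batch), "trigger(s)")
		if first { // simulate events arriving mid-activation
			first = false
			q.enqueue(Trigger{EventID: "e-2"})
			q.enqueue(Trigger{EventID: "e-3"})
		}
	})
}
```

In a real dispatcher the caller would start `run` in a goroutine when `enqueue` returns true; the running flag is what guarantees at most one activation per worker at a time.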

The github-engineer demo includes:
- Full README with prerequisites, setup steps, and teardown instructions
- Runnable end-to-end example of a software engineer worker on GitHub
- Role documentation for handling task lifecycle, review feedback, and board state

Updates to prerequisites:
- Document required gh token scopes (project, read:project)
- Document port availability requirement for helix-org server
- Add instructions for creating and linking a GitHub Project v2 board

Updates to software-engineer role:
- Add dm tool to MCP surface (was: subscribe, read_events)
- Add constraint: escalate setup-level problems to owner via DM instead of failing silently (covers: gh auth issues, missing board, repo unreachable, missing tools, discovery failure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ession reuse

End-to-end working chat + dispatcher → Helix zed_external desktops with the org-graph MCP attached. Each Worker (human or AI) gets its own project + agent app + git repo at hire time; new activations reuse the same long-lived chat session so follow-ups complete in seconds instead of paying a 3-minute cold-start every turn.

Key fixes that came out of debugging against app.helix.ml:
- HelixProjectApplier creates a Helix-internal git repo, seeds it with a README so `main` exists, creates the `helix-specs` branch, and pushes role/identity to `workers/<id>/.context/` on that branch. The desktop's startup script then materialises the helix-specs worktree at `~/work/helix-specs/` automatically.
- Project-apply does NOT auto-create a repo; without one the desktop's startup script bails with "No repositories were cloned successfully" and Zed never launches.
- StartChatRequest now sends `app_id` so `session.ParentApp` is set; Helix's external MCP proxy bails with "session has no associated agent" otherwise, and Zed never sees the helix MCP.
- StartChatRequest sends `organization_id` (Helix doesn't auto-populate it from project_id; without it desktop quota falls back to the personal-org limit of 2).
- Streaming-aware StartChatWithStatus: reads the SSE response, returns the session ID + a flag indicating whether the WS-not-ready race fired. Detached upstream context so the request survives past the caller's request ctx closing.
- warmupAndRetry (chat bridge) and warmupSession (spawner) re-POST the same prompt every 8–20s until the dispatch lands. Helix's waitForExternalAgentReady checks connections globally, so the wait passes immediately when other users have desktops up; the per-session sendCommand then fails fast and Helix marks the interaction error (auto-wake won't recover state=error). The retry pattern absorbs the race client-side.
- Spawner reuses worker.HelixSessionID() across activations. Each fresh session spawns a fresh container; reuse keeps it warm.
- Owner-role hiring playbook updated: hire_worker MUST include `grants` matching the Role's Tools section. The MCP tool list is frozen by Helix's external-MCP-proxy cache for the lifetime of the first session, so granting later means the Worker can't see the tools until session restart.
- Runtime switched from claude_code → zed_agent. claude_code talks directly to Anthropic and needs an API key wired into the container (which we don't); zed_agent routes inference back through Helix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…n agent

Live role-edit (update_role) now propagates to running Workers without requiring a session restart:
- HelixProjectApplier.Ensure no longer early-returns on the fast path before pushing files. The expensive ApplyProject / CreateGitRepo / AttachRepo steps still skip when the project exists, but agent.md / role.md / identity.md are re-pushed to the helix-specs branch on every Ensure call. CreateBranch and PutFile are idempotent and cheap, so the cost is two HTTP calls per activation.
- Spawner activation prompt (helixSpecsMandate) now ALWAYS runs `git pull --ff-only origin helix-specs` at the start of every activation (fall-through to `git worktree add` only when the worktree is missing). Without this, the agent reads the worktree's stale on-disk copy and the new role text never takes effect.
- Activation prompt now also reads `.context/agent.md` first as the org-wide entrypoint, then role.md, then identity.md.
- AgentMD threaded through HelixSpawnerConfig and HelixProjectApplier so the spawner+chat-backend both seed the org policy on apply.

Validated end-to-end via demos/getting-started:
- publish hello → echo: hello (initial role)
- update_role r-echo → "loud: <BODY UPPERCASED>"
- publish hello → loud: HELLO ← live-edit takes effect

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d channel discipline

- Add **On anything else. Stay quiet** block (required in every Role) to establish default behavior: don't post unless a trigger above matches and output is something a human asked for.
- Require explicit output channel per trigger (`Post to s-{channel}` or "no post").
- Add constraint requiring workers to name the trigger before acting, enabling audit-log inspection and forcing commitment to a frame.
- Clarify drafting instructions so LLM-generated Roles include these elements.

This addresses the "chatty colleague" failure mode at the template level: models now have explicit permission boundaries and must name their reasoning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Move Helix-specific Worker fields off domain.Worker into a sidecar WorkerRuntimeState store keyed on (workerID, backend, key). Drops six methods from the domain interface and isolates per-runtime pointers behind typed helpers in agent/helix/state.go.
- Move the runtime layer out of tools/: new agent/, agent/claude/, agent/helix/ packages plus helix/helixclient/ (was tools/helixclient/). tools/ now holds only org-graph MCP tools and Deps.
- Rename SpecsPublisher -> agent.WorkspaceSync. Logical-name contract ("role.md", "identity.md"); each backend translates to its own layout (claude: <envsDir>/<wid>/<name>; helix: workers/<wid>/.context/<name>). Fixes the prior path mismatch where update_role wrote job/* but the activation mandate read .context/*.
- Move agent.md from tools/templates/ to agent/policy.md and embed as agent.Policy so both runtimes share one source.
- Unify session shape: helix.Runtime ("zed_agent") and helix.AgentType ("zed_external") are non-configurable constants used by every project apply and every /sessions/chat post. Drops chat.agent_type config key and the SpawnerConfig.Runtime / ProjectApplier.Runtime fields so the spawner and chat backend can no longer drift to claude_code.

Verified end-to-end against app.helix.ml: getting-started demo (hire echo, publish hello, echo: hello, live update_role, loud: HELLO).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New demo: operator raises NCR on shop floor → agent fans out to supervisor (Slack), customers (SMS), supplier (email held) → supervisor approves containment in one DM → agent confirms and kills/sends supplier email based on approval text. Shows the hold pattern and the split between agent (glue) and human (decisions). Verified end-to-end against app.helix.ml with comms-demo container. Three channels (email/slack/sms), two activations, ~90 seconds on stage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The chat agent was creating s-ncr-raised with the default transport (local) because the hire prompt said "no config", leaving it ambiguous whether the transport itself was needed. Symptom on stage: POST /webhooks/s-ncr-raised → 404 "is not a webhook stream".

Three changes:
- Hire prompt now spells out the create_stream JSON for every stream and explicitly says do not omit the transport field.
- Adds a smoke-test curl after hire that fails fast if any stream is misconfigured.
- Adds the local-transport failure mode to the Recovery table with the verbatim fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The chat agent kept guessing wrong on transport.kind ("webhook", "incoming-webhook", and {"kind":"webhook","direction":"in"}) because the JSON schema exposed kind as a plain string with no enum and no description. We already had a TransportKind enum surfacer wired up in tools/schema.go, but createStreamTransport.Kind was typed as string, not domain.TransportKind, so the enrichment never applied to this schema.

- Retype createStreamTransport.Kind to domain.TransportKind so the existing enum-and-description enrichment kicks in.
- Beef up the tool's Description with the valid kinds and a webhook example for clients that don't render enum constraints.

Verified: schema now exposes enum: ["local", "webhook", "email", "github"] and bad kinds are rejected with the existing self-documenting error ("valid: \"local\", \"webhook\", \"email\", \"github\"").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

create_stream's schema now surfaces transport.kind as enum: ["local","webhook","email","github"] with a description, so the hire prompt no longer has to defend against the agent guessing "incoming-webhook" or omitting the transport entirely.

- Trim the "do not omit transport" guardrail and the post-hire get_stream verification step; both were workarounds for the schema gap, now closed.
- Add a note to always pass `chat --new` after rebuilding the binary; chat-driving claude caches MCP tool schemas at session start and won't see new enum constraints without a fresh session.
- Soften (don't remove) the local-default Recovery row: stale chat sessions on a fresh binary can still hit it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The schema now exposes the valid transport kinds, so the prompt no longer needs literal JSON arguments — describing the streams in words is enough for the agent to call create_stream correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Smaller chat models reliably collapse the canonical {"transport":{"kind":"webhook"}} object to its discriminator string {"transport":"webhook"} once they've seen the kind enum on the schema, then watch the call fail with a JSON-unmarshal error and loop. Both shapes are unambiguous and mean the same thing, so accept both.

- Custom UnmarshalJSON on createStreamTransport handles either form.
- Schema declares transport as a oneOf [enum-string, object] so strict-validating MCP clients accept the shorthand too.
- Tests cover both input forms and the schema shape.

Verified live: create_stream with transport:"webhook" produces a stream with transportKind:"webhook"; the object form still works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
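
One way to accept both shapes is a custom `UnmarshalJSON` that tries the bare string first and falls back to the object form. This is a sketch, not the PR's implementation: the struct is reduced to the `Kind` field, and `parseTransport` is a hypothetical helper added only to exercise it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TransportKind stands in for domain.TransportKind.
type TransportKind string

// createStreamTransport is reduced to the discriminator for this sketch.
type createStreamTransport struct {
	Kind TransportKind `json:"kind"`
}

// UnmarshalJSON accepts the shorthand "webhook" and the canonical object
// {"kind":"webhook"}.
func (t *createStreamTransport) UnmarshalJSON(data []byte) error {
	var kind string
	if err := json.Unmarshal(data, &kind); err == nil {
		*t = createStreamTransport{Kind: TransportKind(kind)}
		return nil
	}
	// The local alias type has no UnmarshalJSON method, which avoids
	// infinite recursion when decoding the object form.
	type plain createStreamTransport
	var p plain
	if err := json.Unmarshal(data, &p); err != nil {
		return fmt.Errorf("transport must be a kind string or an object: %w", err)
	}
	*t = createStreamTransport(p)
	return nil
}

// parseTransport decodes a raw JSON value (hypothetical test helper).
func parseTransport(raw string) (createStreamTransport, error) {
	var t createStreamTransport
	err := json.Unmarshal([]byte(raw), &t)
	return t, err
}

func main() {
	a, _ := parseTransport(`"webhook"`)
	b, _ := parseTransport(`{"kind":"webhook"}`)
	fmt.Println(a.Kind == b.Kind)
}
```

Kind validation is deliberately left out here; in the setup described above it would still run afterwards and produce the self-documenting error for unknown values.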

The chat backend (chat.backend=helix) runs the chat-driving agent inside a Helix sandbox that does NOT have this repo checked out. Telling it to "read ./demos/manufacturing/roles/quality-bot.md" is a dead instruction — the file isn't there. The Zed agent then spirals through every other tool it has trying to find context: kodit_repositories, kodit_wiki, kodit_grep, curl on localhost:9876, ls on the helix-specs branch, etc. Fix: paste the entire role markdown inline in the hire prompt so the agent has zero reason to fetch anything from the filesystem. Add explicit "Use ONLY the helix-org MCP tools, do NOT read files, do NOT use kodit, do NOT curl URLs" steering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The pointer schema arrived as Types:["object","null"]; setting Type without clearing Types produced an invalid jsonschema (both Type and Types non-zero is a marshal error), which broke MCP tools/list at session start and starved Claude of every helix-org tool. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The bare `helix` pattern matched any directory named `helix` at any depth, which was silently swallowing helix-org/helix/ and helix-org/agent/helix/ — entire packages (helixclient, spawner, project applier, runtime state, workspace) sitting in the working tree but never reaching git. The original intent was to ignore the `helix` binary at known cmd paths; anchor it there so the helix-org subtree becomes trackable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…vider

Adopts #2375 (the durable session-scoped message queue) and #2399 (cold-start dev-container wake) so helix-org no longer needs to fight the framework with a client-side warmup loop.

## helix client

- New `SendSessionMessage(ctx, sid, content, opts)` posts to /api/v1/sessions/{id}/messages; Helix persists the interaction and pickupWaitingInteraction delivers it once the agent's WS is reachable. Returns 200 even when no agent is connected.
- New `ListProviders` and `ListModelsForProvider`, plus a `ValidateProviderModel` helper that checks chat.provider / chat.model against the live Helix instance. We hit /v1/models with the provider query string (the bare aggregate endpoint excludes Anthropic and is unreliable).

## Spawner refactor (agent/helix/spawner.go)

- Follow-up activations queue via `SendSessionMessage`, with no StartChat round-trip. 290ms instead of 7s+ on a warm session.
- First activations still use `StartChat` to create the session; on the cold-start `hadWSError` race we re-queue the same prompt via the durable endpoint instead of polling for up to 5 minutes.
- Drops `warmupSession` (~40 lines).
- New tests: `TestSpawnerFollowUpUsesSendSessionMessage` (asserts no StartChat on follow-up) and `TestSpawnerColdStartReQueues` (asserts the hadWSError → queue handoff).

## Chat-bridge refactor (server/chat/helix_bridge.go)

- Same two-path treatment: follow-ups via `SendSessionMessage`, fresh sessions via `StartChat` with cold-start fallback to the queue.
- Drops `warmupAndRetry` and the 5-minute background goroutine (~70 lines).
- Existing test updated to assert follow-ups go through the queue.

## Provider/model validation

- `bootstrap helix-runtime` now runs the validator after WhoAmI and prints the actual providers/models on failure.
- `serve` refuses to start with bad chat.provider / chat.model and points operators at the exact config commands to fix it.

Without this, a typo in chat.provider surfaces as a 422 from /sessions/{id}/zed-config three minutes later when the desktop tries to fetch its Zed config, with no obvious link back to the bad key. The validator turns that into a fail-fast at startup.

## Verified end-to-end against meta.helix.ml

Final smoke session: ses_01kr9bcpcm9gnpr7k5y4fgjmdk
- First send → StartChat (~31s for Zed cold boot) → "pong"
- Follow-up → SendSessionMessage (347ms to queue) → response within ~10s

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds CheckDesktopQuota helper that hits /api/v1/config and refuses when max_concurrent_desktops would be exceeded by spinning up one more session. Wired into both code paths that open *new* zed_external sessions:
- agent/helix/spawner.go::ensureSession (AI Worker activations)
- server/chat/helix_bridge.go::send (owner chat first turn)

Follow-ups skip the check; they reuse the warm container and don't allocate a new desktop slot.

Without this, a quota-full Helix would let helix-org spin up the per-Worker project plumbing (apply secrets, attach MCP, create agent app) and only fail at the StartDesktop step with a generic 500 several seconds later. The new error message names the actual count and points operators at the fix:

    desktop quota reached on Helix (3/2 active) — stop one of the existing sessions before opening a new one

The check is soft (no atomic reserve); a parallel caller could still race for the last slot, in which case Helix's own quota error wins. That's acceptable; the goal is operator clarity in the common single-user case.

Verified end-to-end against meta.helix.ml: with active=3/max=2, send returned 500 + actionable message in 289ms; after stopping two sessions (active=1), the same request opened a session in 7s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
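
The decision at the core of the quota check can be sketched as a pure function. This leaves out the HTTP call to /api/v1/config and the counting of active sessions; the name `checkDesktopQuota`, its parameters, and the treatment of a non-positive limit as "unlimited" are assumptions, while the error text follows the message quoted above.

```go
package main

import "fmt"

// checkDesktopQuota refuses when opening one more desktop would exceed
// maxDesktops. A non-positive maxDesktops is treated as "no limit", which
// is an assumption of this sketch rather than documented Helix behaviour.
func checkDesktopQuota(active, maxDesktops int) error {
	if maxDesktops <= 0 {
		return nil
	}
	if active+1 > maxDesktops {
		return fmt.Errorf(
			"desktop quota reached on Helix (%d/%d active) — stop one of the existing sessions before opening a new one",
			active, maxDesktops)
	}
	return nil
}

func main() {
	fmt.Println(checkDesktopQuota(3, 2)) // refused: over quota
	fmt.Println(checkDesktopQuota(1, 2)) // allowed: one slot left
}
```

Because nothing is reserved between the check and the session start, two callers can both pass with one slot left; as the commit notes, Helix's own quota error then decides the race.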

philwinder force-pushed the feat/helix-org-prompt-driven-mcp branch 2 times, most recently from d9a9c99 to 01e9388 Compare April 27, 2026 13:23

philwinder changed the title ~~feat: helix-org prototype with MCP, prompt-driven CLI, and Role/Identity split~~ feat: helix-org prototype with MCP, prompt-driven CLI, transports (webhook/email/github), and Role/Identity split Apr 28, 2026

philwinder and others added 27 commits May 4, 2026 11:43

philwinder force-pushed the feat/helix-org-prompt-driven-mcp branch from 2386a28 to 284d86b Compare May 4, 2026 09:43

philwinder and others added 13 commits May 6, 2026 17:22

demo fix

0548792


philwinder wants to merge 41 commits into main from feat/helix-org-prompt-driven-mcp


